Genomics is the science of genomes, the genetic material that contains our hereditary information. A living organism’s genome contains the genes that encode the proteins that make up its body and help to determine its characteristics. Physically, a genome is just strings of DNA, themselves made of repeats of four different nucleotide building blocks, labelled A, C, G and T. Genomes can be many millions to many thousands of millions of nucleotides long.
The study of genomes begins by generating genome sequence, that is, working out the order of all the millions of A, C, G and T in the DNA. This is done by extracting DNA from an organism (usually as simple as mashing up a leaf or blood sample in some chemicals) and putting it through a sequencing machine. These machines are only good enough to give us fragments of sequence a few hundred nucleotides long, but these fragments, called reads, are the raw data on which we work. Once we have these short reads of parts of the sequence, the exciting part of genomics can begin.
Assembly is the set of processes by which we take the short reads and create larger contiguous sequences (contigs) from them. These aren’t always complete: we don’t always get everything back into the one big sequence it originally was, nor do we always know the order that the contigs should be in. When we have only a fragmented assembly, we call it a draft assembly.
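To make the idea of building contigs from reads concrete, here is a minimal sketch of one simple strategy: repeatedly merge the two sequences that share the longest end-to-start overlap. It is a toy illustration, not how real assemblers work in detail. The function names (`overlap`, `greedy_assemble`) and the `min_len` parameter are made up for this example, and it assumes error-free reads that all come from the same strand of DNA.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain.

    Assumes error-free reads from a single strand (a simplification).
    Returns a list of contigs; more than one contig means a fragmented
    (draft) assembly.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        # Join the pair, keeping the overlapping region only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

With these three overlapping reads the sketch produces a single contig. Reads that share no overlap, such as `["AAAA", "CCCC"]`, are returned as separate contigs, which is the fragmented situation described above as a draft assembly.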
The genes are the main functional units in the genome, but there are lots of other scientifically interesting and important features in the genome that we would want to identify. The process of detecting genes and other genomic features and classifying them is called annotation.
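One very simple kind of annotation can be sketched in a few lines: scanning a sequence for open reading frames, that is, stretches that begin with the start codon ATG and run, three nucleotides at a time, to a stop codon (TAA, TAG or TGA). This is only an illustration of the general idea. The function name `find_orfs` and the `min_codons` threshold are assumptions made for this example, it looks at one strand only, and real annotation draws on far more evidence than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) positions of simple open reading frames.

    Only the forward strand is scanned, in each of its three reading
    frames. Positions are 0-based and end is exclusive, so seq[start:end]
    is the reading frame including its stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next candidate
    return sorted(orfs)


seq = "CCATGAAATTTTAGGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

Each reported pair of positions marks a candidate feature that could then be classified, for example by comparing it with genes already known from other organisms.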
When genes are activated, a number of copies of the gene, called transcripts, are made. Transcripts are made of an analogous chemical, RNA. RNA can easily be converted to DNA and sequenced in the same way as the genome. The abundance of transcripts of the same gene can reflect the level of its activity. Quantifying the abundance of transcripts is one part of transcriptomics.
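As a rough sketch of what quantifying abundance means, the example below counts how many sequenced reads match each gene. The gene names and sequences are invented for illustration, `count_transcripts` is a hypothetical helper, and exact substring matching stands in for the read alignment step that a real analysis would use. Real analyses typically also adjust the counts for gene length and for the total number of reads sequenced.

```python
def count_transcripts(reads, genes):
    """Count reads that exactly match part of each gene sequence.

    genes maps a gene name to its sequence. Each read is credited to the
    first gene that contains it; reads matching no gene are ignored.
    """
    counts = {name: 0 for name in genes}
    for read in reads:
        for name, seq in genes.items():
            if read in seq:
                counts[name] += 1
                break
    return counts


# Made-up example data.
genes = {
    "geneA": "ATGGCGTACCTGA",
    "geneB": "ATGTTTCCCGGGTAA",
}
reads = ["GGCGT", "TACCT", "TTTCC", "CCCC"]
print(count_transcripts(reads, genes))
```

Here geneA collects more reads than geneB, which in a real experiment would suggest, with suitable caveats, that geneA was the more active of the two.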
Individual genes can produce different transcripts. Different sub-sections of each gene are used to make the different classes or isoforms of transcript. Identifying the different isoforms present is another goal of transcriptomics experiments.
When we have more than one genome, we can start to look for similarities and differences between them. Collectively, this is called comparative genomics.
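The simplest possible comparison is to line up the same stretch of sequence from two genomes and note where the nucleotides differ. The sketch below does just that. It assumes the two sequences are non-empty, already aligned and of equal length, which is a big simplification, and the function names `differences` and `percent_identity` are chosen for this example only.

```python
def differences(seq_a, seq_b):
    """List (position, nucleotide_a, nucleotide_b) wherever the two differ.

    Assumes the sequences are already aligned and the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (already aligned)")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    diffs = differences(seq_a, seq_b)
    return 100.0 * (len(seq_a) - len(diffs)) / len(seq_a)


genome_1 = "ATGGCGTACC"
genome_2 = "ATGGAGTACC"
print(differences(genome_1, genome_2))
print(percent_identity(genome_1, genome_2))
```

The list of differing positions captures the differences between the two sequences, and the percentage identity summarises how similar they are overall.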