
What is GCF and GCA?

Published in Genomic Identifiers

GCA and GCF are the accession prefixes NCBI uses for genome assembly records: GCA marks a GenBank assembly and GCF marks a RefSeq assembly. Here is what each represents:


Understanding GCA

  • GCA (GenBank Assembly): An assembly record submitted by a researcher or institution to GenBank, a public repository for nucleotide sequences.
    • The submitter owns the GCA record.
    • GCA records are archival: they are preserved as the submitter provided them.
    • Annotation is optional, so a GCA record may or may not include annotated genes or other genomic features.


Understanding GCF

  • GCF (RefSeq Genome Assembly): An NCBI-derived copy of a submitted GCA assembly.
    • NCBI creates each GCF record from a submitted GCA assembly.
    • GCF records usually carry curated annotation that NCBI has added or standardized.
    • NCBI maintains GCF records as a controlled reference set of genomic data.


Key Differences Summarized

Feature     | GCA (GenBank Assembly) | GCF (RefSeq Genome Assembly)
Origin      | Directly submitted     | Derived from a submitted GCA
Ownership   | Submitter              | NCBI
Annotation  | Optional               | Usually curated by NCBI
Maintenance | Submitter              | NCBI
Purpose     | Archival record        | Reference genome assembly
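
In practice, the prefix of an accession tells you which column of the table applies. The following is a minimal Python sketch of that lookup, not an NCBI tool. It assumes the usual accession shape (the GCA_ or GCF_ prefix, nine digits, a dot, and a version number), which the text above does not spell out. The names parse_assembly_accession and AssemblyAccession are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Assumed accession shape: GCA_ or GCF_, nine digits, a dot, a version number.
ACCESSION_RE = re.compile(r"(GCA|GCF)_(\d{9})\.(\d+)")

# Database and maintainer for each prefix, taken from the comparison table.
SOURCE_BY_PREFIX = {
    "GCA": ("GenBank", "Submitter"),
    "GCF": ("RefSeq", "NCBI"),
}


@dataclass(frozen=True)
class AssemblyAccession:
    """Hypothetical container for the parts of an assembly accession."""
    prefix: str
    number: str
    version: int
    database: str
    maintained_by: str


def parse_assembly_accession(accession: str) -> AssemblyAccession:
    """Split an accession into its parts and look up who maintains it."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a GCA/GCF assembly accession: {accession!r}")
    prefix, number, version = match.groups()
    database, maintained_by = SOURCE_BY_PREFIX[prefix]
    return AssemblyAccession(prefix, number, int(version), database, maintained_by)


if __name__ == "__main__":
    # Sample accession strings used only to exercise the parser.
    for text in ("GCA_000001405.29", "GCF_000001405.40"):
        record = parse_assembly_accession(text)
        print(text, "is a", record.database, "assembly maintained by:", record.maintained_by)
```

The sketch only inspects the string. It does not check that the accession exists at NCBI, and it makes no claim about whether a given GCA record has a GCF counterpart.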


Practical Example

Imagine a research team sequences a new bacterial genome. They submit the assembly to GenBank, which produces a GCA record. If the assembly is selected for RefSeq, NCBI uses that GCA record to generate a corresponding GCF record, which usually includes NCBI's curated annotation. Researchers can then cite the GCF record in downstream analysis, knowing that its annotation was produced and maintained by NCBI rather than varying from submitter to submitter.


In Simple Terms

  • GCA is like the original submitted data, potentially including the submitter's own annotations.
  • GCF is like a curated, standardized copy made by NCBI for use as a reference.

